Goto

Collaborating Authors

 foreground group



Contrastive dimension reduction: when and how?

Neural Information Processing Systems

Dimension reduction (DR) is an important and widely studied technique in exploratory data analysis. However, traditional DR methods are not applicable to datasets with with a contrastive structure, where data are split into a foreground group of interest (case or treatment group), and a background group (control group). This type of data, common in biomedical studies, necessitates contrastive dimension reduction (CDR) methods to effectively capture information unique to or enriched in the foreground group relative to the background group. Despite the development of various CDR methods, two critical questions remain underexplored: when should these methods be applied, and how can the information unique to the foreground group be quantified? In this work, we address these gaps by proposing a hypothesis test to determine the existence of contrastive information, and introducing a contrastive dimension estimator (CDE) to quantify the unique components in the foreground group. We provide theoretical support for our methods and validate their effectiveness through extensive simulated, semi-simulated, and real experiments involving images, gene expressions, protein expressions, and medical sensors, demonstrating their ability to identify the unique information in the foreground group.


Contrastive Dimension Reduction: A Systematic Review

Hawke, Sam, Zhang, Eric, Chen, Jiawen, Li, Didong

arXiv.org Machine Learning

Contrastive dimension reduction (CDR) methods aim to extract signal unique to or enriched in a treatment (foreground) group relative to a control (background) group. This setting arises in many scientific domains, such as genomics, imaging, and time series analysis, where traditional dimension reduction techniques such as Principal Component Analysis (PCA) may fail to isolate the signal of interest. In this review, we provide a systematic overview of existing CDR methods. We propose a pipeline for analyzing case-control studies together with a taxonomy of CDR methods based on their assumptions, objectives, and mathematical formulations, unifying disparate approaches under a shared conceptual framework. We highlight key applications and challenges in existing CDR methods, and identify open questions and future directions. By providing a clear framework for CDR and its applications, we aim to facilitate broader adoption and motivate further developments in this emerging field.



Contrastive dimension reduction: when and how?

Neural Information Processing Systems

Dimension reduction (DR) is an important and widely studied technique in exploratory data analysis. However, traditional DR methods are not applicable to datasets with with a contrastive structure, where data are split into a foreground group of interest (case or treatment group), and a background group (control group). This type of data, common in biomedical studies, necessitates contrastive dimension reduction (CDR) methods to effectively capture information unique to or enriched in the foreground group relative to the background group. Despite the development of various CDR methods, two critical questions remain underexplored: when should these methods be applied, and how can the information unique to the foreground group be quantified? In this work, we address these gaps by proposing a hypothesis test to determine the existence of contrastive information, and introducing a contrastive dimension estimator (CDE) to quantify the unique components in the foreground group. We provide theoretical support for our methods and validate their effectiveness through extensive simulated, semi-simulated, and real experiments involving images, gene expressions, protein expressions, and medical sensors, demonstrating their ability to identify the unique information in the foreground group.